 national library


Swedish Whispers; Leveraging a Massive Speech Corpus for Swedish Speech Recognition

arXiv.org Artificial Intelligence

This work presents a suite of fine-tuned Whisper models for Swedish, trained on a dataset of unprecedented size and variability for this mid-resourced language. As languages of smaller sizes are often underrepresented in multilingual training datasets, substantial improvements in performance can be achieved by fine-tuning existing multilingual models, as shown in this work. This work reports an overall improvement across model sizes compared to OpenAI's Whisper evaluated on Swedish. Most notably, we report an average 47% reduction in WER comparing our best performing model to OpenAI's whisper-large-v3, in evaluations across FLEURS, Common Voice, and NST.


Visual Navigation of Digital Libraries: Retrieval and Classification of Images in the National Library of Norway's Digitised Book Collection

arXiv.org Artificial Intelligence

Digital tools for text analysis have long been essential for the searchability and accessibility of digitised library collections. Recent computer vision advances have introduced similar capabilities for visual materials, with deep learning-based embeddings showing promise for analysing visual heritage. Given that many books feature visuals in addition to text, taking advantage of these breakthroughs is critical to making library collections open and accessible. In this work, we present a proof-of-concept image search application for exploring images in the National Library of Norway's pre-1900 books, comparing Vision Transformer (ViT), Contrastive Language-Image Pre-training (CLIP), and Sigmoid loss for Language-Image Pre-training (SigLIP) embeddings for image retrieval and classification. Our results show that the application performs well for exact image retrieval, with SigLIP embeddings slightly outperforming CLIP and ViT in both retrieval and classification tasks. Additionally, SigLIP-based image classification can aid in cleaning image datasets from a digitisation pipeline.
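
Embedding-based image retrieval of the kind described above can be sketched as a nearest-neighbour search under cosine similarity. The sketch below is only an illustration: the three-dimensional vectors and the image names (`map_page`, `portrait`, `ornament`) are hypothetical stand-ins for real ViT, CLIP, or SigLIP embeddings, and nothing here is taken from the authors' application.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def retrieve(query, index, top_k=3):
    """Return the top_k (name, score) pairs most similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; a real system would store model outputs here.
index = {
    "map_page": [0.9, 0.1, 0.0],
    "portrait": [0.1, 0.9, 0.2],
    "ornament": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # hypothetical embedding of the query image
results = retrieve(query, index, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

With real embeddings the same ranking logic applies, although a large collection would typically use an approximate nearest-neighbour index rather than a full scan.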


Medical mT5: An Open-Source Multilingual Text-to-Text LLM for The Medical Domain

arXiv.org Artificial Intelligence

Research on language technology for the development of medical applications is currently a hot topic in Natural Language Understanding and Generation. Thus, a number of large language models (LLMs) have recently been adapted to the medical domain, so that they can be used as a tool for mediating in human-AI interaction. While these LLMs display competitive performance on automated medical texts benchmarks, they have been pre-trained and evaluated with a focus on a single language (English mostly). This is particularly true of text-to-text models, which typically require large amounts of domain-specific pre-training data, often not easily accessible for many languages. In this paper, we address these shortcomings by compiling, to the best of our knowledge, the largest multilingual corpus for the medical domain in four languages, namely English, French, Italian and Spanish. This new corpus has been used to train Medical mT5, the first open-source text-to-text multilingual model for the medical domain. Additionally, we present two new evaluation benchmarks for all four languages with the aim of facilitating multilingual research in this domain. A comprehensive evaluation shows that Medical mT5 outperforms both encoders and similarly sized text-to-text models for the Spanish, French, and Italian benchmarks, while being competitive with current state-of-the-art LLMs in English.


Boosting Norwegian Automatic Speech Recognition

arXiv.org Artificial Intelligence

In this paper, we present several baselines for automatic speech recognition (ASR) models for the two official written languages in Norway: Bokmål and Nynorsk. We compare the performance of models of varying sizes and pre-training approaches on multiple Norwegian speech datasets. Additionally, we measure the performance of these models against previous state-of-the-art ASR models, as well as on out-of-domain datasets. We improve the state of the art on the Norwegian Parliamentary Speech Corpus (NPSC) from a word error rate (WER) of 17.10% to 7.60%, with models achieving 5.81% for Bokmål and 11.54% for Nynorsk. We also discuss the challenges and potential solutions for further improving ASR models for Norwegian.
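
To put the NPSC figures above in relative terms, the drop from 17.10% to 7.60% WER can be expressed as a relative reduction. This is a small arithmetic sketch; the function name `relative_wer_reduction` is an assumption made for illustration, not something taken from the paper.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Fraction by which new_wer improves on baseline_wer (both in the same units)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# WER values in percent, as quoted in the abstract above.
reduction = relative_wer_reduction(17.10, 7.60)
print(f"Relative WER reduction: {reduction:.1%}")
```

The absolute improvement is 9.5 percentage points, which corresponds to roughly a 56% relative reduction.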


Artificial intelligence uncovers lost work by titan of Spain's 'Golden Age'

The Guardian

Lost or misattributed works by some of the finest writers of Spain's Golden Age could be discovered thanks to pioneering AI technology that has been used to identify a previously unknown play by the wildly prolific dramatist, poet, sailor and priest Lope de Vega. This week Spain's National Library announced that researchers trawling its massive archive had stumbled upon and verified a play that Lope is believed to have written a few years before his death in 1635. Like many plays of the Spanish Golden Age – the 16th- and 17th-century cultural boom that accompanied Spain's imperial growth and which birthed masterpieces by Lope, Cervantes, Calderón and Velázquez, among many others – La francesa Laura (The Frenchwoman Laura) is a tale of love, jealousy and social hierarchy in which suspicion demands an innocent woman be sacrificed on the altar of her husband's honour. But, unlike many similar plays of the period, Laura survives and the third act ends happily. Equally unusual was the manner of the play's discovery.


Emerging Trends in AI, Ultrasound and OB/GYN Care

#artificialintelligence

From entertainment to commerce, artificial intelligence (AI) is making a difference in many aspects of life and has the power to advance diagnostics. Next-generation healthcare technology has begun implementing many AI-powered tools to improve efficacy and patient safety, and enhance the clinician experience [1]. There are several image acquisition and analysis capabilities that can be enhanced by an AI application for each task [2]. Nearly every woman requires an ultrasound at some point during their care. There is huge potential for AI to assist in repetitive tasks and provide promising workload-changing advances with the use of ultrasound in obstetrics and gynecologic (OB/GYN) care [2].


The Norwegian Parliamentary Speech Corpus

arXiv.org Artificial Intelligence

The Norwegian Parliamentary Speech Corpus (NPSC) is a speech dataset with recordings of meetings from Stortinget, the Norwegian parliament. It is the first publicly available dataset containing unscripted Norwegian speech designed for training automatic speech recognition (ASR) systems. The recordings are manually transcribed and annotated with language codes and speakers, and there is detailed metadata about the speakers. The transcriptions exist in both normalized and non-normalized form, and non-standardized words are explicitly marked and annotated with standardized equivalents. To test the usefulness of this dataset, we have compared an ASR system trained on the NPSC with a baseline system trained on only manuscript-read speech. These systems were tested on an independent dataset containing spontaneous, dialectal speech. The NPSC-trained system performed significantly better, with a 22.9% relative improvement in word error rate (WER). Moreover, training on the NPSC is shown to have a "democratizing" effect in terms of dialects, as improvements are generally larger for dialects with higher WER from the baseline system.
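
For readers unfamiliar with the metric, word error rate is commonly computed as the word-level edit distance between a reference transcript and the system output, divided by the number of reference words. The following is a minimal sketch under that standard definition; it assumes whitespace tokenization and no text normalization, which may differ from the evaluation setup used for the NPSC, and the example sentences are invented.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one substitution and one deletion over four reference words.
wer = word_error_rate("the meeting is open", "the meeting was")
print(f"WER: {wer:.2f}")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.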


How machine learning is bringing National Library of Scotland's maps to life

#artificialintelligence

What if machine learning meant that you didn't have to have a definitive starting point and the reams of records in the archives could be explored and enjoyed visually? That is the vision of Martin Disley who has been creating datasets from across the National Library of Scotland's (NLS) map collection. His project, which is part of the Creative Informatics Resident Entrepreneur project at the University of Edinburgh, curated datasets of images previously scanned by the NLS to feed a machine learning model. The newly-created machine learning model then creates 'fake' versions of the images that it is trained upon. The generated output from this process can be animated to produce visions of machines dreaming, in this case the fake maps animated and brought to life. This has the effect of synthesising these large collections down into short videos.


Fantastic Futures 2019 Conference

#artificialintelligence

Stanford Libraries will host the 2nd International Conference on AI for Libraries, Archives, and Museums over three days, December 4, 5 & 6, 2019. The first 'Fantastic Futures' conference, which took place in December 2018 at the National Library of Norway in Oslo, initiated a community-focused approach to addressing the challenges and possibilities for libraries, archives, and museums in the era of artificial intelligence. The Stanford conference will expand that charge, adding to the plenary gathering a full day of workshops and a half day 'unconference' shaped by the interests of those assembled. Wednesday, December 4, will be a day of plenary sessions to introduce attendees to a range of topics in AI, from the concerns of algorithmic bias and data privacy to the exciting developments in transforming discovery and digital content curation (see the full program). The two keynote addresses reflect Stanford Library's position as an academic center in close proximity to Silicon Valley: Bryan Catanzaro, the Vice President of Applied Deep Learning at Nvidia, will speak to the important contribution he thinks libraries can make in AI.


Medication Management System That Uses AI To Help Doctors Treat At-Risk Patients Better

International Business Times

Poor medication adherence is a widespread medical problem that leads to poor health outcomes and inflates healthcare costs. According to the U.S. National Library of Medicine, 75 percent of Americans have trouble taking medicine as instructed by their doctors. Israeli personalized medication management platform Medisafe wants to change this using artificial intelligence (AI). The start-up uses AI and machine learning on its medication adherence platform. It passively collects data from patients, such as medications prescribed and health measurements, and uses self-learning algorithms to help patients adhere better to their prescribed medication.